Inline Code
Volume Number: 1
Issue Number: 9
Column Tag: Forth Forum
Inline Code for MacForth
By Jörg Langowski, Chemical Engineer, Fed. Rep. of Germany, MacTutor Editorial Board
Speeding up Forth with Inline Code
When you use your computer for applications that require a lot of data shuffling
and calculation, or that work with large arrays and matrices, you tend to become a
little paranoid about speed. Forth code is very compact because of its threaded
structure, and word execution (i.e. subroutine calling) is reasonably well optimized in
MacForth (see MacTutor V1 No2). Even so, I have always felt uncomfortable with the overhead
that goes into executing a simple word like DROP, whose 'active part' consists of
a single 16-bit word of machine code.
Just as a reminder: when the Forth interpreter executes the token for DROP in a
definition, it calls a subroutine that looks like this:
DROP    ADDQ.L  #4,A7       ; pop one 4-byte cell (A7 = stack pointer)
        JMP     (A4)        ; on to NEXT, whose address is in A4
So the whole DROP job is done by a simple 4-byte increment of the stack pointer.
But then the next token has to be fetched and executed by jumping to the NEXT routine,
whose address is contained in A4, the base pointer. Compared with the increment itself,
this amounts to an overhead of several hundred percent. The overhead is less dramatic
with other words, but it is still there. All in all, the Sieve benchmark needs 21
seconds to run in MacForth, compared with 9 seconds in compiled C (Consulair).
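The following toy model makes the overhead concrete. It is a token-threaded inner interpreter written in Python, because MacForth itself cannot be run here. The names `run` and `PRIMS` are hypothetical, and the model ignores real timing. It only shows that every primitive costs one full pass through NEXT, however small its active part: fetch the token, advance the A3-like pointer, and jump.

```python
# Toy model of a token-threaded inner interpreter (not MacForth's actual NEXT).
# Each primitive does its small job and returns; NEXT then fetches the following token.
from typing import Callable, Dict, List


def run(thread: List[str], prims: Dict[str, Callable[[List[int]], None]],
        stack: List[int]) -> int:
    """Execute a list of tokens and return how many times NEXT dispatched."""
    ip = 0              # plays the role of A3: index of the next token
    dispatches = 0
    while ip < len(thread):
        token = thread[ip]      # NEXT: fetch the token...
        ip += 1                 # ...advance the token pointer...
        dispatches += 1
        prims[token](stack)     # ...and jump to the primitive's code
    return dispatches


PRIMS: Dict[str, Callable[[List[int]], None]] = {
    "DROP": lambda s: s.pop(),                     # active part: ADDQ.L #4,A7
    "DUP": lambda s: s.append(s[-1]),
    "+": lambda s: s.append(s.pop() + s.pop()),
}

stack = [1, 2, 3]
n = run(["DROP", "DUP", "+"], PRIMS, stack)
print(stack, n)   # [1, 4] 3
```

Three tokens mean three trips through NEXT. In the real system, each trip costs more than the one-instruction body of DROP, so small primitives pay a large relative price.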
How can we speed up the code? After all, we have complete control over what goes
into the dictionary and could put the machine code we need right in there, with no need
for time-consuming subroutine calls. This is what the Forth 2.0 assembler enables
you to do. However, if you create a piece of code in Forth assembler, it tends to look
much more cryptic than 'normal' assembler, which after all is readable with adequate
documentation.
It would be much nicer to have a way of creating the assembly code that
corresponds to a DROP by writing a similar word, such as %DROP: something like a
macro. There would be no need to worry about which registers to use, and you could write
your routine in 'almost normal' Forth code.
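A macro such as %DROP would deposit DROP's active instruction into the dictionary in place of a token. The Python sketch below illustrates the idea; it is not MacForth's implementation. The helpers `addq_l_areg`, `w_comma` and `pct_drop` are hypothetical. The encoding of ADDQ.L #4,A7 follows the 68000 opcode layout.

```python
# Sketch of a %DROP-style macro: instead of compiling a token that calls DROP,
# append DROP's 68000 instruction itself. Helper names are hypothetical.


def addq_l_areg(data: int, areg: int) -> int:
    """Encode ADDQ.L #data,An (data 1..8; the value 8 is encoded as 0)."""
    assert 1 <= data <= 8 and 0 <= areg <= 7
    return (0x5000              # ADDQ/SUBQ line, bit 8 = 0 selects ADDQ
            | (data & 7) << 9   # 3-bit immediate field
            | 0b10 << 6         # size field: .L
            | 0b001 << 3        # mode 001: address register direct
            | areg)


dictionary = bytearray()


def w_comma(value: int) -> None:
    """Append a 16-bit word, big-endian as on the 68000 (cf. MacForth's w,)."""
    dictionary.extend(value.to_bytes(2, "big"))


def pct_drop() -> None:
    """Hypothetical %DROP: inline ADDQ.L #4,A7 (A7 = stack pointer)."""
    w_comma(addq_l_areg(4, 7))


pct_drop()
print(dictionary.hex())   # 588f
```

The result is a single 16-bit word. The inlined DROP therefore occupies the same two bytes that a 16-bit token would, and no NEXT dispatch is needed.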
It shouldn't be that difficult to persuade the Forth system to execute machine code
that is embedded in a definition. Every Forth word starts with at least one executable
piece of machine code: trap calls for Forth-defined words such as colon definitions, and
'real' 68000 code for machine code definitions. However, this gives you either
machine code or Forth, not both. Our goal is to define words that allow switching
between 68000 and Forth code within one definition. Similar words do exist in the
Forth 2.0 assembler, but it lacks a set of macros that allow you to write inline Forth
code instead of assembly code. Furthermore, you cannot define control structures that
easily.
Assume we have Forth code that looks like this:
...
...
etc. This sequence of instructions will execute just fine if the word preceding the machine
code transfers execution to the address just following it. We'll call this word >CODE and
define it as follows:
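: >CODE   ( -- )
   here 2+ make.token w,   ( compile a token pointing just past itself )
   [compile] [ ;           ( leave compile mode: following words execute )
immediate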
This word, which is executed during compilation, takes the next free address in
the dictionary, adds 2 (this is where execution of the machine code is to start) and
compiles this address as a token into the dictionary. Since a token just tells the Forth
interpreter 'jump to the address that I refer to', machine code execution will start at
the address following >CODE.
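A small Python model can check the address arithmetic in >CODE. For illustration only, it assumes that a token is the 16-bit offset of its target from a base address. MacForth's actual make.token encoding may differ. `BASE`, `make_token` and `token_target` are hypothetical names.

```python
# Model of >CODE at compile time. Assumption: a token is the 16-bit offset of
# its target from BASE (MacForth's real make.token encoding may differ).

BASE = 0x10000            # hypothetical start of the dictionary
mem = bytearray(64)       # the dictionary space
here = BASE + 8           # pretend 8 bytes are already compiled


def make_token(addr: int) -> int:
    return (addr - BASE) & 0xFFFF


def token_target(token: int) -> int:
    return BASE + token


def w_comma(value: int) -> None:
    """Store a 16-bit word at HERE and advance HERE by 2."""
    global here
    off = here - BASE
    mem[off:off + 2] = value.to_bytes(2, "big")
    here += 2


def to_code() -> int:
    """>CODE: compile a token whose target is the byte right after the token."""
    token_addr = here
    w_comma(make_token(here + 2))   # here 2+ make.token w,
    return token_addr


tok_at = to_code()
stored = int.from_bytes(mem[tok_at - BASE:tok_at - BASE + 2], "big")
print(token_target(stored) == here)   # True: execution lands at the new HERE
```

The token points exactly at the new HERE. Whatever the following macros deposit there is the code that runs when the token is executed.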
This is what happens at execution time. At compilation time, the words following
>CODE in the input stream are executed, not compiled (this is what the [COMPILE] [
does). Therefore, if the words following >CODE are macros that stuff assembly code
into the dictionary, you have your inline code right there.
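The following toy outer interpreter in Python models this switching between compiling and executing. Everything in it is hypothetical. Words are strings, the "dictionary" is a list of labels rather than bytes, and the compile state is a single flag. The model only mirrors the order of events, not MacForth's real compiler.

```python
# Toy outer interpreter showing how >CODE and >FORTH flip the compile state.
# Hypothetical model: the "dictionary" is a list of labels, not real bytes.

compiled = []                  # what ends up in the definition
state = {"compiling": True}    # True = compile mode, False = execute mode


def to_code():
    """>CODE (immediate): compile a token aimed at the next cell, stop compiling."""
    compiled.append("token -> next cell")
    state["compiling"] = False


def pct_drop():
    """%DROP (macro): runs at compile time and deposits DROP's machine code."""
    compiled.append("ADDQ.L #4,A7")


def to_forth():
    """>FORTH: deposit the A3 reset and the jump to NEXT, then resume compiling."""
    compiled.extend(["LEA 4(PC),A3", "JMP (A4)"])
    state["compiling"] = True


# name -> (action, is_immediate)
WORDS = {">CODE": (to_code, True), "%DROP": (pct_drop, False),
         ">FORTH": (to_forth, False)}


def compile_source(text: str) -> None:
    for name in text.split():
        entry = WORDS.get(name)
        if entry and (entry[1] or not state["compiling"]):
            entry[0]()                 # immediate word, or any word in execute mode
        elif state["compiling"]:
            compiled.append(name)      # compile the word's token
        else:
            raise ValueError(f"{name}: unknown to this toy model")


compile_source("DUP >CODE %DROP >FORTH SWAP")
print(compiled)
```

DUP and SWAP are compiled as tokens. >CODE runs because it is immediate, and %DROP and >FORTH run because the state flag is off between them.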
We'll get to those macros in a minute. First, there remains the problem of how to
get out of the machine code. You might recall that all machine-level Forth definitions
end with a
JMP (A4)
and the NEXT routine, pointed to by A4, gets the next token from the Forth code.
The pointer to the next token is in register A3. Unfortunately, after >CODE has executed,
A3 is unchanged and still points to the word following the >CODE token, which is
68000 code and certainly nothing the interpreter will swallow.
Therefore we have to reset A3 before we jump back into the Forth interpreter. This is
what the word >FORTH does:
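: >FORTH  47fa0004 ,  4ed4 w,  [compile] ] ;   ( numbers are hex )
        LEA     4(PC),A3    ; A3 := address of the first token after the JMP
        JMP     (A4)        ; re-enter the Forth interpreter via NEXT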
Remember, when >FORTH appears in the input stream, we are still in execution
mode, from the preceding >CODE (unless we mixed things up). So >FORTH is executed, not
compiled, when used in a definition. It assembles code that, at run time, loads A3 with the
address following the JMP and then executes the JMP. Finally, >FORTH switches back to
regular Forth compilation.
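The Python sketch below checks two things: that the hex numbers in >FORTH are the two instructions listed, and that A3 ends up just past the JMP. The encoders `lea_pc_d16` and `jmp_ind` are hypothetical helpers built from the 68000 opcode layout, and the start address is arbitrary. The key detail is that in d16(PC) mode, the PC value used is the address of the extension word, which lies two bytes into the LEA.

```python
# Check of the >FORTH sequence. Encodings follow the 68000 opcode layout;
# helper names and the start address are hypothetical.


def lea_pc_d16(disp: int, areg: int) -> bytes:
    """LEA d16(PC),An: opcode word followed by the 16-bit displacement."""
    opcode = 0x41C0 | areg << 9 | 0b111 << 3 | 0b010   # mode 7, reg 2 = d16(PC)
    return opcode.to_bytes(2, "big") + (disp & 0xFFFF).to_bytes(2, "big")


def jmp_ind(areg: int) -> bytes:
    """JMP (An): address register indirect."""
    return (0x4ED0 | areg).to_bytes(2, "big")


code = lea_pc_d16(4, 3) + jmp_ind(4)
print(code.hex())                  # 47fa00044ed4

start = 0x2000                     # hypothetical address of the LEA
# For d16(PC), the PC value is the address of the extension word (start + 2).
a3 = (start + 2) + 4
print(a3 == start + len(code))     # True: A3 -> first byte after the JMP
```

The six bytes match `47fa0004 ,` followed by `4ed4 w,`. The displacement of 4 skips the rest of the LEA's extension word and the two-byte JMP, so NEXT resumes with the first Forth token after the inline code.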
Between >CODE and >FORTH we can now place our macros that generate inline
machine code corresponding to Forth primitives. The code for any of the primitives is